- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-124165: Optimize keep_top_bit() in heapq module by using _Py_bit_length() #124166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
129bbea    to
    dd1075d      
    Compare
  
    dd1075d    to
    8014255      
    Compare
  
            
          
                Include/internal/pycore_bitutils.h
              
                Outdated
          
        
      | else { | ||
| return 0; | ||
| } | ||
| #elif defined(_MSC_VER) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_BitScanReverse64() is not available on 32-bit Windows (x86, ARM). You should check if you're running 64-bit Windows.
8014255    to
    2eb9e24      
    Compare
  
    | I noticed the warnings in the CI tests on Ubuntu related to unsigned/signed integer conversions. However, it seems most of them are not caused by my code changes. In the keep_top_bit function, I did cast the variable n, which is of type Py_ssize_t, to uint64_t. But since n will never be a negative integer, I believe we are safe. | 
        
          
                Include/internal/pycore_bitutils.h
              
                Outdated
          
        
      | } | ||
| #elif defined(_MSC_VER) && defined(_WIN64) | ||
| // _BitScanReverse64() is documented to search 64 bits. | ||
| Py_BUILD_ASSERT(sizeof(uint64_t) <= 8); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this assertion, I don't see its purpose.
| msb += BIT_LENGTH_TABLE[x]; | ||
| return msb; | ||
| #endif | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add tests for this function in Modules/_testinternalcapi.c? Copy check_bit_length() and test_bit_length() to make 64-bit variant and add your test_bit_length64() to module_functions.
Introduce a new function _Py_bit_length64() to handle 64-bit integers efficiently. This function utilizes __builtin_clzll() for GCC/Clang and _BitScanReverse64() for MSVC, ensuring compatibility across different platforms. For compilers that do not support these intrinsics, a fallback implementation using a lookup table is provided. This addition is necessary for safely computing the bit length of 64-bit integers, which is particularly important when working with types like Py_ssize_t on platforms where size_t is 64-bit, such as Windows.
Introduce a new test function, test_bit_length64(), which verifies the correctness of _Py_bit_length64().
…_bit_length() Optimize the keep_top_bit() function to use _Py_bit_length() instead of manually shifting bits. This allows for more efficient execution on hardware that supports specialized instructions such as bsr on x86. Additionally, the function is now marked as inline, as it is small and only used in one location, improving the potential for inlining by the compiler.
2eb9e24    to
    b62ecdb      
    Compare
  
    | Thanks for the suggestion; however, the  Also the code for | 
| Sad to hear that. | 
| It wasn't "unnecessary noise". Your suggestions are always welcome. This particular case wasn't wrong. It was just too microscopic to be worth having an external code path. | 
Optimize the keep_top_bit() function to use _Py_bit_length() instead of manually shifting bits. This allows for more efficient execution on hardware that supports specialized instructions such as bsr on x86. Additionally, the function is now marked as inline, as it is small and only used in one location, improving the potential for inlining by the compiler.